Moving Large Data Sets Over High-Performance Long Distance Networks
Abstract
In this project, we examine the performance characteristics of three tools used to move large data sets over dedicated long-distance networking infrastructure. Although the performance of wide area networks has been studied frequently, those analyses have tended to focus on network latency and on peak throughput measured with network traffic generators. In this study, we instead perform an end-to-end long-distance networking analysis that includes reading large data sets from a source file system and committing them to a destination file system. An evaluation of end-to-end data movement is also an evaluation of the system configurations employed and the tools used to move the data. For this paper, we built several storage platforms and connected them with a high-performance long-distance network configuration. We use these systems to analyze the capabilities of three data movement tools: BBcp, GridFTP, and XDD. Our studies demonstrate that existing data movement tools neither deliver efficient performance nor exercise the storage devices in their highest-performance modes. We describe the device information required to achieve high levels of I/O performance and discuss how this information applies to use cases beyond data movement performance.
Publication date: 2011